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Abstract: 

The thermodynamic behavior of a three-dimensional off-lattice model for protein folding is probed. 
The model has only two types of residues, hydrophobic and hydrophilic. In absence of local inter- 
actions, native structure formation does not occur for the temperatures considered. By including 
sequence independent local interactions, which qualitatively reproduce local properties of functional 
proteins, the dominance of a native state for many sequences is observed. As in lattice model ap- 
proaches, folding takes place by gradual compactification, followed by a sequence dependent folding 
transition. Our results differ from lattice approaches in that bimodal energy distributions are not 
observed and that high folding temperatures are accompanied by relatively low temperatures for the 
peak of the specific heat. Also, in contrast to earlier studies using lattice models, our results con- 
vincingly demonstrate that one does not need more than two types of residues to generate sequences 
with good thermodynamic folding properties in three dimensions. 
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1 Introduction 



In the process of unveiling central issues in the thermodynamics and kinetics of protein folding, 
simplified models where the amino acid residues constitute the basic entities seem to exhibit many 
non-trivial and interesting properties Q . In particular lattice model approaches with contact inter- 
actions only have become increasingly popular. The lattice and contact term approximations may 
seem drastic. Nevertheless, it turns out that such simple models are able to catch non-trivial aspects 
of the folding problem. This is interesting and encouraging. However, it does not imply that the 
approximations involved are understood. It is therefore crucial to pursue the study of alternative 
models, e.g. exploring off-lattice models with Lennard- Jones interactions. 

The major advantage of lattice models is computational efficiency, at least for small chain sizes. 
With improved algorithms and faster computers this advantage is losing in importance. In this 
paper we propose and study an extension to three dimensions (3D) of a two-dimensional (2D) 
off-lattice model that was successfully studied in Refs. (|, [|. This model contains only two 
types of amino acids, hydrophobic and hydrophilic, and the key part of the energy function is 
a species-dependent term that favors the formation of a hydrophobic core. However, as will be 
demonstrated below, this term alone is not sufficient in order to have thermodynamically dominant 
native states. This observation is reminiscent of existing lattice model results || |(|, and could be 
taken as an indication that it is essential to have more than two different types of amino acids. In 
this paper we explore another possibility; do the folding properties depend on species-independent 
local interactions sticking to two amino acid types only? It should be noted that both lattice and 
2D off-lattice models implicitly contain local interactions. In the lattice case the mere presence of a 
discretized space "stiffens" the dynamics locally, and in two dimensions the continuum movements 
are hampered by compressing one dimension. 

The purpose of this work is to construct and numerically study a 3D generalization of the 2D model 
of Refs. H |J. As design criteria for such a model we have: 

• The model should give rise to thermodynamically dominant states - i.e. be biologically plau- 
sible from a stability point of view. 

• The local interactions should be chosen to at least qualitatively reproduce bond and torsional 
angle distributions and local correlations found in functional proteins. 

We propose a simple form for the local interactions, which are found to play an important role. 
Without any local interaction in the model, the local structure of the chains is much more irregular 
than for proteins. It turns out that one can obtain a local structure qualitatively similar to that of 
proteins, by adjusting the strength of the local interactions. Having chosen the local interactions in 
this way, we reexamine the overall thermodynamic behavior. We find that not only have the local 
properties improved, but the native states have also become more stable. 

Furthermore, we examine the structure formation and the properties of the folding transition by 
studying the temperature dependence of various thermodynamic and structural variables, and the 
distributions of these quantities at the folding temperature. These results can be understood in terms 
of a gradual collapse to compact structures where the (sequence dependent) folding transition occurs. 
The qualitative aspects of this picture confirm what has been found in lattice model calculations 
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jl], p. However, it turns out that our results differ from lattice approaches in that bimodal energy 
distributions are not observed and that high folding temperatures are accompanied by relatively 
low temperatures for the peak of the specific heat. 

The paper is organized as follows. In Sec. 2 the 3D model is defined and in Sec. 3 we extract local 
properties and correlations for functional proteins from the PDB data base |J. Monte Carlo (MC) 
methods and measurements are described in Sec. 4. The results and the summary are found in 
Sec. 5 and 6 respectively. 



2 The Model 



We start by defining the simplified geometry of the model. Each residue is represented by a single 
site located at the C Q position. These sites are linked by rigid bonds of unit length, bi, to form 
linear chains living in three dimensions. The shape of an N-mer is either specified by the N — 1 
bond vectors 6, or by N — 2 bond angles Tj and N — 3 torsional angles ccj (see Fig. |l|). 




Figure 1: Definition of bi, tj and a*. 



The model contains two kinds of residues, A and B, which behave as hydrophobic (cr,=+l) and 
hydrophilic (<Tj = — 1) residues respectively. The energy function is given by 

JV-2 N-3 AT-2 N 

E(b; a) = -k x ^ &« ' ~ K 2^ b i ' ^+2 + X! X! E ™{nj;Pi,<?j) (1) 

i— 1 z—1 i— 1 j—i+2 

where = (r^+i, . . . , 7j-_i; a^+i, . . . , aj-2) denotes the distance between residues i and j, and 
<7i,...,crjv is a binary string that specifies the primary sequence. The species-dependent global 
interactions are given by the Lennard-Jones potential, 

E^ir^G^aj) = 4e(cr J , o-j)(4y - i) . (2) 

ij ij 
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The depth of the minimum of this potential, e(<Tj, <Tj), is chosen to favor the formation of a core of 
A residues 

r i aa 

e(ai,aj)=i (3) 
[ i BB, AB 

The two parameters of the energy function, K\ and k 2 , determine the strength of species-independent 
local interactions. The model will be explored for different values of these two parameters, and our 
final choice will be k\ = —1 and K2 = 0.5. 

The behavior of the model at finite temperature T is given by the partition function 

exp(-£(S;a)/T) . (4) 

Let us stress that the interactions in Eq. [L] are not chosen so as to stabilize the native state of some 
particular sequence. Rather, our goal is to study general sequences for a given energy function. We 
have attempted to choose this energy function as simple as possible. Anticipating some of results 
to be presented in Sec. 5, let us here briefly discuss the relevance of the different interaction terms. 

Species-dependent interactions. It is obvious that the global interactions play a key role in the 
model; these interactions are responsible for the compactification of the chain, and for the formation 
of a hydrophobic core. For functional proteins, we find that probability distributions of bond and 
torsional angles depend only weakly on the hydrophobicity pattern for the residues involved. In our 
model, the form of these distributions is very sensitive to the choice of global interactions, and it 
can be strongly sequence dependent. In order to avoid this, we have chosen the potentials for the 
three different types of residue pairs to be fairly similar; they are all attractive at large separations, 
and the location of the minimum is the same. 

Species-independent interactions. It is less clear how important the local interactions are. 
When studying the behavior of several different sequences in the absence of the local interactions 
(ki = «2 = 0), the stability of the native state tends to be very low. Furthermore, we find that local 
correlations along such chains are weak, which is in line with the findings of Ref. ]To| . For functional 
proteins the corresponding correlations are fairly strong, which is a manifestation of the presence 
of secondary structure. For these reasons we decided to incorporate the local interactions in the 
model. In this way one gets stiffer chains, which implies stronger local correlations. In addition, 
the stability of the native states tends to improve, as we will see below. 

We have studied the model for many different choices of n\ and Ki- The general behavior of the 
system is fairly insensitive to small changes of these parameters. Below we will focus on the results 
obtained for the three choices K2) = (0,0),(-l,0) and (-1,0.5). 



3 Local Structures in Functional Proteins 

In a 3D off-lattice model, it is possible to check the local properties against those for functional 
proteins in a direct way. We have probed the local structure in two different ways. First we consider 
the distributions of bond and torsional angles. Second we study local correlations along the chain. 



Z(T;a) = J ]Jdb 



3 



0.03 



P((X; 



0.00 




CD 

o 
d 



o 
o 



90 180 270 360 



a. 









150 








100 k. ' 




50 








150 



100 



- 50 



90 180 270 360 



Figure 2: Bond (ji) and torsional (oti) angle distributions for functional proteins. 



In this section we describe the results one finds for functional proteins. These results will be used 
in Sec. 5 to compare with the qualitative local behavior of our model. 



Distributions of bond and torsional angles. These distributions for functional proteins are 
well-known JO], [l^] and arc included here for completeness. We consider the structure defined by 
the backbone of C" atoms. The results were obtained using a set of 505 selected sequences Q[| 
from the Protein Data Bank Q . This set was obtained by allowing for a maximum of 25% sequence 
similarity for aligned subsequences of more than 80 residues jl4| . Within this set of 505 minimally 
redundant sequences, 491 contained complete backbone information; the others were excluded from 
our analysis. 



In Fig. g the distributions of bond and torsional angles are shown together with a scatter plot for 
these two quantities. Note that the calculation of the torsional angle requires four consecutive 
C a atoms, while the bond angle t% requires only three. The additional fourth C" atom needed for 
on is taken in the N-terminus direction. 

From Fig. || it is evident that there are strong regularities in the local structure. The t; and 
oti distributions both exhibit a clear two-peak structure, which can be associated with two well- 
populated regions in the (Tj, ai)-plane. One of these regions, Tj £ [85°, 100°] and a.i £ [35°, 70°], 
corresponds to right-handed a-helix and the other, r t £ [105°, 145°] anda^ £ [170°, 250°], corresponds 
to /3-sheet. 



5 The May 1996 edition was used. 
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Local correlations. We study local correlations using the function 

N-d-l 

Cb{d) = N-d-1 2 hl ' il+d (5) 

i=l 

where bi are normalized (virtual) bond vectors. Local correlations along protein chains have been 
studied by Ref. |T0], using a correlation function slightly different from that in Eq. ||. In Fig. [||a 
we show the correlation function Cb(d) for functional proteins. As can be seen from this figure, 
there are significant correlations at least out to separations of about eight residues. The oscillations 
present can be related to the presence of a-helix structure, which has a period of 3.6. 




Figure 3: The correlation function C&(d) for functional proteins (see Eq. 



4 Methods 



4.1 Monte Carlo Method 



Numerical simulations have been performed for a variety of different choices of («i, K2) and different 
sequences and temperatures. At low temperatures conventional Monte Carlo methods tend to 
become extremely time-consuming, due to the presence of high free-energy barriers. As in Refs. §, § 
we have therefore chosen to employ the dynamical-parameter method, which means that some 
parameter of the model is taken as a dynamical variable which takes values ranging over a definite 
set. In this way it is possible to greatly improve the frequency of transitions between the different 
free-energy valleys; for the 2D model studied in Ref. H speedup factors of 10 3 -10 4 were observed. 

In the present work the temperature is treated as a dynamical variable ("simulated tempering" 
fl5|l ). More precisely, the joint probability distribution 

P(b, k) cx ex.p{-g k - E{b, &)/T k ) (6) 



5 




1000000 2000000 3000000 4000000 

iterations 

Figure 4: Evolution of the quenched (diamonds) and unquenched (line) energies in the simulation 
of sequence 1 (see Table 1). Measurements were taken every 10 iterations. Shown are the data 
corresponding to the lowest allowed temperature. The thermalization phase of 50000 sweeps is not 
shown. 

is simulated, where T/., k = 1, . . . , K, are the allowed values of the temperature. The g^s are tunable 
parameters which must be chosen carefully for each sequence. We refer the reader to Refs. || Q for 
details on how to determine these parameters. The joint distribution P(b, k) is simulated by using 
separate Metropolis steps Jl6| in k and b. For b we use two types of elementary moves: rotations of 
single bonds and moves of pivot type Jl7t • 

In our simulations we used a set of K = 25 allowed temperatures, which are equidistant in 1/T and 
ranging from 0.15 to 3.0. 

In order to study the energy level spectrum we use a quenching procedure; in the course of the 
simulations the system is quenched to zero temperature at regular intervals. For this purpose we 
employ a conjugate gradient method. We found this method more efficient than using a Monte Carlo 
algorithm with T = 0. Also, we tested two different conjugate gradient algorithms. In the conjugate 
gradient method successive minimizations are carried out along different lines. Information about 
the gradient is needed to define the lines, and may or may not be utilized for the minimizations. 
For the present problem we found that minimization without gradient calculation is faster. The 
quenching procedure only accounts for a small portion of the total computing time. 

In Fig. |J we show the evolution of the quenched and unquenched energies in a typical simulation of 
a N = 20 chain. 



4.2 Measurements 

In Sec. 3 we discussed measurements of local properties of the chains. In order to study the sta- 
bility of the full native structure, further information is needed. To this end we have studied the 
distribution of the mean-square distance between configurations, S^ b . For two configurations a and 
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b, 5l b is defined as 

i N 

^ = min^£|^-*f| 2 (7) 



i=l 

where — xf \ denotes the distance between the sites and x[ b \ and where the minimum is 
taken over translations, rotations and reflections. The corresponding distribution, P(6 2 ), for fixed 
temperature and sequence, reads 

P{S 2 ) = 1 J dU a Ub^S(S 2 - ^ 6)e -B(gW;a)/T) e - B (6(« iCT )/T) (g) 

where S(-) denotes the Dirac delta function. For convenience, we often use the mean 

(S 2 ) = J d6' 2 P{6' 2 )6' 2 (9) 
rather than the full distribution P(S 2 ). 

We have also measured the specific heat Cy and gyration radius r syr , defined by 

Cv = ±{(E 2 )-{E) 2 ) (10) 



i=l 

The specific heat has a maximum in the vicinity of the folding transition. To accurately determine 
the height and location of this maximum we use the multihistogram method |l8[ fl9| . 

In addition to these measurements, we employ the quenching procedure described above. Removing 
the thermal noise in this way is of great help in monitoring the evolution of the system, but requires 
a substantial amount of computer time. Therefore, we have performed these calculations at larger 
intervals than other measurements, and only at the lowest of the temperatures studied. 

The quenching procedure provides us with a set of low-lying local energy minima. For some of the 
studied chains wc believe that the lowest minimum obtained in our simulation is the global minimum 
of the energy function, as will be discussed below. The mean-square distance to the global minimum 
will be denoted by Sq (cf Eq. (?]). Using the corresponding distribution, P(Sq), we define a probability 
Po of finding the system within a small neighborhood of the global minimum. We take 

Po= I P{5l)d§l (12) 
Jo 

with the parameter A = 0.04; this choice of A is motivated by the Sq distributions (cf Figs. |l] a 
and b). Let us stress that the distribution P(5 2 ) is different from the distribution P{5 2 ) introduced 
earlier; P(6 2 ) measures general fluctuations rather than deviations from a given state. 

The degree of folding may be defined in terms of the quantity pq. Alternatively, it may be defined 

as 

77° 

Q = — (13) 
n 
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no. 


sequence 


T s 


nmax 


Tf 


1 


BAAA AAAB AAAA BAAB AABB 


0.361 


51.4 


< 0.15 


2 


BAAB AAAA BAB A ABAA AAAB 


0.319 


55.9 


< 0.15 


3 


AAAA BBAA AABA ABAA ABBA 


0.298 


65.8 


0.23 


4 


AAAA BAAB ABAA BBAA ABAA 


0.273 


61.5 


0.22 


5 


BAAB BAAA BBBA BAB A ABAB 


0.327 


49.8 


< 0.15 


6 


AAAB BABB ABAB BAB A BAB A 


0.257 


62.6 


0.15 



Table 1: The six sequences studied. The errors in T 8 and C™ ax are less than 0.005 and 0.3 respec- 
tively. The errors in Tf are approximately 0.02. 



where n° is the number of occupied native contacts, and n is the total number of native contacts. 
Two monomers i and j are taken to be in contact if r£ < 1.75; this cutoff is motivated by the 
distribution P(rf,-) (not displayed in this paper). We have Q = 1 for the native structure. Also, 
it is useful to divide the set of native contacts into local and global contacts. A contact between 
monomers i and j will be called local if 2 < \i — j\ < 4 and global if \i — j\ > 4. We set Qi = n°/ni, 
where n° is the number of occupied local native contacts and ni is the total number of local native 
contacts. Similarly, we define Q g as the fraction of occupied global native contacts. 



5 Results 



The model is defined by two parameters, K\ and k 2 , which set the strengths of the local interactions. 
In order to investigate the importance of these interactions, we first performed preliminary runs for 
a number of different («i,At2) values. In particular, these explorations aimed at establishing the 
balance between the local and global interactions — the overall scale of («i,K2)- More extensive 
simulations were then carried out for three different choices 



(0,0) 

(-1,0) 

(-1,0.5) 



(14) 



using sequences of length N = 20. It turns out that shorter chains exhibit less interesting and 
discriminative behavior. The sequences, which were deliberately chosen to represent a variety of 
behavior, are listed in Table (j} 

In this section we first compare the behavior of the model at the three values of k 2 ) with respect 
to the local interactions and low T properties in order to single out one set of (ki, k%) values. Wc 
then examine the folding properties and thermodynamic behavior for (ki,kq) = (—1,0.5) in some 
detail. 
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Figure 5: Bond (tj) and torsional (c^) angle distributions at T = 0.15 for the six simulated sequences, 
(a) pure Lennard-Jones potential (/«i,«2) = (0,0) and (b) (Ki,«a) = (—1,0.5). The potential 
corresponding to (ki, K2) = (—1, 0) yields a similar distribution as in (b). 



5.1 The Local Interactions 



Prior to investigating folding properties for different choices of (ki,^), we compare the bond and 
torsional angle distributions and the local correlations of the chains with those of functional proteins. 
In Figs. U and ^| the model counterparts of Figs. and Hare shown. The data in Fig. || were obtained 
at a fixed temperature, T = 0.15, while those in FigTg were obtained using quenched low energy 
structures. In both cases we expect the data to reflect the behavior for a wide range of not too high 
temperatures. 

The results strongly indicate the need for local interactions when it comes to mimicking functional 
proteins; the strong regularities in the local structure observed for proteins are to a large extent 
missing for a pure Lennard-Jones potential. This can be seen from the torsional angle distribution 
and the correlation function Cb(d). The torsional angle cti varies over the whole range of 360°, 
without any strongly suppressed regions, and the correlation Cb(d) is very weak for all d > 0. The 
conclusion that local correlations are weak for a pure Lennard-Jones potential is in good agreement 
with the findings of Ref. |l(J . 

From the Figs, ^jand^it is also clear that changing k\ from to -1 leads to a significant improvement; 
the torsional angle distribution becomes concentrated to a few relatively small regions, and local 
correlations become stronger. The range of the correlation Cb(d) is for «i = — 1 comparable to what 
it is for proteins. Needless to say, the model is not intended to reproduce the precise form of the 
correlations. However, it is encouraging that the qualitative behavior of this very simple model is 
consistent with the one from functional proteins. 

The remaining question is whether a non-zero K2 is called for. No conclusive evidence can be drawn 
from Figs. |5| and ^ alone in this respect, although one may argue that the range of the correlation 
Cb{d) is still somewhat short for (ki,K2) = (—1,0). 
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Figure 6: The correlation function Cb(d) (Eq. ||) for 3D model chains; (ki, n 2 ) = (0fi) (o), (-1,0) (□) 
and (-1,0.5) (x). 

Next we investigate how the choice of parameters affects the key folding property identified in 
Ref. (J] — the S 2 distribution should be peaked for low 5 2 . In Fig. [?], P(S 2 ) is shown for sequence 4 
in Table [|. In contrast to the local properties discussed above, P{5 2 ) is strongly sequence dependent. 
Nevertheless, Fig. [?] illustrates some general trends seen in the data. First, the pure Lennard-Jones 
interaction yields a very broad distribution of <5 2 , implying that folding properties are poor. This 
is true for all the six sequences studied. Second, although the behavior is different for one of the 
six sequences, the k 2 7^ choice appears to have a distinct edge over the one ignoring the torsional 
angle interaction. 

In summary, our results show that local interactions are essential in order to get a regular local 
structure and structural stability. When comparing the results for (ki,/?2) = (—1,0) and (-1,0.5), 
we find that the structural stability tends to be highest for (k 1; k 2 ) = (—1, 0.5). In what follows we 
will focus on this choice of parameters for the six sequences in Table fil. 



5.2 2D Revisited 

In Refs. [0, 0], a similar model was studied in 2D using somewhat different Lennard-Jones 
parameters and local interactions corresponding to (k±, K2)=(\, 0) (cf Eq. |l|). Here, we briefly 
discuss the importance of local interactions for this 2D model. 

For this purpose, we leave out the local interaction term, i.e. we set (/ti, «2)=(0, 0). Using this 
pure Lennard-Jones potential, we redo the simulations for 15 sequences. At T = 0.15, we find a 
strong correlation between the (<5 2 )'s for the two types of potentials; thus (S 2 ) varies widely with 
sequence even for the (fti, «2)=(0, 0) potential. This is in contrast to our findings in 3D, where all 
the sequences studied have a large (S 2 ) for («i, k 2 )=(0, 0). 
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Figure 7: P(S 2 ) at T = 0.15 for sequence 4 (cf Table 1); (ki,k 2 )=(0,0) (dots), (-1,0) (dashes), and 
(-1,0.5) (solid). The low-<5 2 peak for («i,«2)=(-lj0.5) extends to 37. 

5.3 Folding Properties 

When investigating the folding properties of the chains we focus on the thermodynamics. This is 
in part inspired by the results from Ref. Q], where the thermodynamic properties exhibited strong 
sequence dependence. Initially we examine the various thermodynamic quantities defined in Sec. 4.2 
over the entire probed T range. Then we proceed with the "magnifying glass" to the 0.15< T <0.50 
region, where the different chains exhibit strongest difference in behavior and the folding properties 
can be studied in some detail. 



5.3.1 Thermodynamic Behavior 

In Fig. H the behavior of the different thermodynamic quantities are shown over the entire T range. 
The overall size of the molecule, as measured by r 2 yr , decreases substantially when T decreases from 
3 to 0.15, as can be seen from Fig. g. This decrease is gradual. The data points essentially fall 
onto two different curves. The upper curve corresponds to the sequences 5 and 6 with composition 
10A+10B, and the lower curve to the sequences 1-4 with composition 14A+6B. 

Next we turn to (<5 2 ), which measures the size of the fluctuations. In Fig. || we show the relative 
magnitude (S 2 )/r 2 r . This ratio exhibits a peak slightly above T = 1, and is approximately sequence 
independent down to T w 0.4; above this temperature (5 2 ) shows a sequence (composition) depen- 
dence similar to that of r 2 yi . Below T w 0.4 the situation is different. Here (5 2 ) is strongly sequence 
dependent, in contrast to r 2 yi . In Fig. ^| the results are shown for (5 2 ) in the low-T region. 



11 




12 





In order to understand the low-T behavior it is useful to consider po, which measures the relative 
population of the lowest energy state (Eq. |l^). A comparison of the data for (S 2 ) and pq shows 
that small (5 2 ) values are associated with large po values. Therefore, it is reasonable to define the 
folding temperature Tf as the temperature where po — 1/2. Estimates of Tf are given in Table 1. 
For three of the six sequences Tf is smaller than 0.15, as can be seen from Fig. |lO|a. It should be 
stressed that the shape of the molecule is very compact already above Tf. This will be discussed in 
more detail below for one of the sequences. 

From Figs. || and |Io| b it can be seen that the specific heat exhibits a peak in the low-T region, 
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slightly above Tf. The height, Cy ax , and location, T s , of the peak are given in Table 1. We find 
that Cy ax is large for the sequences with high Tf , which is in accordance with the results from the 
lattice model study of Ref. ||. However, our results for T s are somewhat different from those of 
Ref. ||. In our model T s is relatively low for sequences with high Tf, while the results reported 
by Ref. § show the opposite behavior. Thus, the separation T s — Tf exhibits a stronger sequence 
dependence in our model. 

Also shown in Fig. || is the T dependence for the occupancy of local and global native contacts. 
In the vicinity of Tf, Qi is large and varies slowly, whereas Q g changes rapidly. In particular, this 
demonstrates that the formation of local native structure, like the compactification, is a gradual 
process that to a large extent takes place above Tf. 



5.3.2 The Folding Transition Region 

In this subsection we study the behavior at the folding temperature Tf, where po = 1/2, in some 
more detail. Two different sequences are considered, 4 and 6. Sequence 4 has a fairly high Tf value, 
whereas sequence 6 represents a more typical Tf. 

In Figs, [n] a and b we show histograms of Sq, the mean-square distance to the lowest energy state, 
near T = Tf. Both histograms exhibit a multi-peak structure, with a narrow peak at low <5§ that 
corresponds to the native state. The thermodynamic weight of the native state is, by definition, 
approximately 50% in both cases. However, the distributions differ in shape, which is important 
from the viewpoint of kinetics. From Figs. |ll]a and b one concludes that the Sq distribution is much 
more rugged for sequence 6 than for sequence 4. This suggests that, at T — Tf, folding is fastest 
for sequence 4, the sequence with highest Tf. We also explored using Q rather than Sq as reaction 
coordinate, with similar results. 

The Q distribution at Tf has been studied previously by Ref. J?]], using a lattice model. Here the 
distributions for a folding and a non-folding sequence were compared, and these were found to be 
almost identical. This may seem to contradict our findings, and suggest that there is a difference 
between the two models. However, it should be remembered that Ref. did not study the full Q 
distribution, but rather the Q distribution corresponding to maximally compact structures only. 

In Figs, [ll] c and d we show the probability distributions of E and r 2 yl at T « Tf for sequence 4. Also 
shown are the contributions to these distributions from conformations with Sq < 0.04 and Sq > 0.04, 
respectively. This corresponds to a simplified two-state picture where each conformation is classified 
as either native- like or not. The shape of the Sq distribution shows that such a classification is feasible 
in an essentially unambiguous way. 

As one might have expected from the sharpness of the peak in the specific heat, Fig. [ll] c shows 
that these two "states" differ significantly in energy, although the total E distribution is unimodal. 
However, the average size of the molecule is very similar for the two states. This fact clearly 
demonstrates that the compactification occurs prior to the dominance of the native state setting in. 

These results may be compared with those of Ref. || , where a detailed study of the behavior at Tf 
was carried out for a lattice model. Here the total number of contacts, C, rather than r 2 was used 
as a measure of compactness. In contrast to the results shown in Figs. |11] c and d, the probability 
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Figure 11: Histograms at T » T f for sequence 4 (T = 0.22) and 6 (T = 0.15). (a) S% for seq. 4. 
(b) Sq for seq. 6. (c) E for seq. 4. (d) rj yr for seq. 4. The dashed and dotted lines in (c) and (d) 
represent the contributions corresponding to Sq < 0.04 and 6% > 0.04, respectively. 



distributions of both E and C were found to be bimodal for a sequence with high Tf. This may 
reflect a genuine difference between lattice and off-lattice models. 



6 Summary 



We have extended the off-lattice protein model of Ref. ||] to three dimensions. In doing so, one 
key issue has been the choice of species-independent local interactions that balance the species- 
dependent non-bonded Lennard- Jones interactions. The local bond and torsional angle interactions 
were chosen to satisfy two criteria: 



15 



• The resulting low temperature configurations should at least qualitatively reproduce the local 
angle distributions and correlations of functional proteins. 

• It should be possible to produce good folders; i.e. there should be sequences with thermody- 
namically stable structures at not too low temperatures. 

It turns out that the presence of local interactions is necessary for satisfying these criteria. Among 
the two local interactions, the bond angle one is the most crucial one in this respect. 

After having specified the interaction to meet these requirements we study the thermodynamic 
behavior for six different sequences. The following generic behavior emerges: 

• As the temperature is decreased, a gradual compactification takes place. 

• In the compact state a sequence dependent folding transition occurs, where the good folding 
sequences are characterized by a higher folding temperature. In terms of the specific heat, 
these good folders also have more pronounced peaks. 

• In the state from which the transition to the native state occurs, a large fraction of the native 
contacts are already formed. The contacts missing are mainly those corresponding to large 
topological distance along the chain. 

This picture is consistent with what is observed for lattice models [jj], |) . A few minor, but significant, 
differences should be mentioned though. 

• We do not observe bimodal distributions of energy or compactness as in Ref. [||. In a sim- 
plified two-state picture, the two coexisting states at the folding temperature do correspond 
to different energy distributions, but the overlap between the two is large enough to blur this 
effect. 

• For the sequences studied, high folding temperature is accompanied by a relatively low tem- 
perature for the peak of the specific heat. This is in contrast to what was reported in Ref. ||. 

Finally, we stress the fact that we have studied sequences with only two types of residues. In 
Refs. ||, |(| a number of binary (two-letter code) sequences were studied in a 3D lattice model. The 
two-letter code was found to be insufficient in the sense that these sequences did not have unique 
native structures. In our model, many binary sequences do have unique native structures. 

Our ability to map out the thermodynamics of the 3D off-lattice model, relies heavily upon the 
efficiency of the dynamical-parameter algorithm of Refs. j| ||. As it stands, the results for each 
N = 20 chain require 70 CPU hours on an Alpha DecStation 200. We feel confident that additional 
algorithmic efficiency improvements can be made, which will enable us to probe longer and more 
3D chains than reported in this work. 
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